Parsing, Word Associations and Typical Predicate-Argument Relations

Authors

  • Kenneth Ward Church
  • William A. Gale
  • Patrick Hanks
  • Donald Hindle
Abstract

There are a number of collocational constraints in natural languages that ought to play a more important role in natural language parsers. Thus, for example, it is hard for most parsers to take advantage of the fact that wine is typically drunk, produced, and sold, but (probably) not pruned. So too, it is hard for a parser to know which verbs go with which prepositions (e.g., set up) and which nouns fit together to form compound noun phrases (e.g., computer programmer). This paper will attempt to show that many of these types of concerns can be addressed with syntactic methods (symbol pushing), and need not require explicit semantic interpretation. We have found that it is possible to identify many of these interesting co-occurrence relations by computing simple summary statistics over millions of words of text. This paper will summarize a number of experiments carried out by various subsets of the authors over the last few years. The term collocation will be used quite broadly to include constraints on SVO (subject verb object) triples, phrasal verbs, compound noun phrases, and psycholinguistic notions of word association (e.g., doctor/nurse).

1. Mutual Information

Church and Hanks (1989) discussed the use of the mutual information statistic in order to identify a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). Mutual information, I(x;y), compares the probability of observing word x and word y together (the joint probability) with the probabilities of observing x and y independently (chance).

$$I(x;y) \equiv \log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

If there is a genuine association between x and y, then the joint probability P(x,y) will be much larger than chance P(x) P(y), and consequently I(x;y) >> 0, as illustrated in the table below.
If there is no interesting relationship between x and y, then P(x,y) = P(x) P(y), and thus I(x;y) = 0. If x and y are in complementary distribution, then P(x,y) will be much less than P(x) P(y), forcing I(x;y) << 0. Word probabilities, P(x) and P(y), are estimated by counting the number of observations of x and y in a corpus, f(x) and f(y), and normalizing by N, the size of the corpus. Joint probabilities, P(x,y), are estimated by counting the number of times that x is followed by y in a window of w words, f_w(x,y), and normalizing by N(w − 1).²

1. The paper was previously presented at the International Workshop on Parsing Technologies, CMU, 1989.
2. The window size parameter allows us to look at different scales. Smaller window sizes will identify fixed expressions (idioms), noun phrases, and other relations that hold over short ranges; larger window sizes will highlight semantic concepts and other relationships that hold over larger scales.
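The estimation procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `mutual_information`, the `tokens` argument, and the default window size are assumptions introduced here, and the corpus is assumed to be already tokenized into a flat list of words.

```python
import math
from collections import Counter


def mutual_information(tokens, w=5):
    """Estimate I(x;y) for ordered word pairs (hypothetical helper).

    f(x) is the count of x in the corpus, N is the corpus size, and
    f_w(x, y) counts how often y follows x inside a window of w words.
    P(x, y) is estimated as f_w(x, y) / (N * (w - 1)).
    """
    if w < 2:
        raise ValueError("window size w must be at least 2")

    n = len(tokens)
    word_freq = Counter(tokens)   # f(x)
    pair_freq = Counter()         # f_w(x, y)

    for i, x in enumerate(tokens):
        # y ranges over the next w - 1 tokens, so that x and y
        # both fit inside a window of w words.
        for y in tokens[i + 1 : i + w]:
            pair_freq[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair_freq.items():
        p_xy = f_xy / (n * (w - 1))
        p_x = word_freq[x] / n
        p_y = word_freq[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Toy input for illustration only; real use needs millions of words.
    toy = "the doctor and the nurse saw the doctor and the nurse".split()
    for pair, score in sorted(mutual_information(toy, w=3).items()):
        print(pair, round(score, 3))
```

Only pairs that actually co-occur receive a score, so strongly negative associations (pairs that never appear together) do not show up in the result. Scores based on very small counts should be treated with caution.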


Similar Resources

Deep Linguistic Analysis for the Accurate Identification of Predicate-Argument Relations

This paper evaluates the accuracy of HPSG parsing in terms of the identification of predicate-argument relations. We could directly compare the output of HPSG parsing with PropBank annotations, by assuming a unique mapping from HPSG semantic representation into PropBank annotation. Even though PropBank was not used for the training of a disambiguation model, an HPSG parser achieved the accuracy...


From dictionary to corpus to self-organizing dictionary: learning valency associations in the face of variation and change

ing over specific lexically-governed particles and prepositions and specific predicate selectional preferences, but including some `derived' / `alternant' semi-productive, and therefore only semipredictable, bounded dependency constructions, such as particle or dative movement, there are at least 163 valency frames associated with verbal predicates in (current) English (Briscoe, 2000). In this ...


Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...


Word-based Japanese typed dependency parsing with grammatical function analysis

We present a novel scheme for a word-based Japanese typed dependency parser which integrates syntactic structure analysis and grammatical function analysis such as predicate-argument structure analysis. Compared to bunsetsu-based dependency parsing, which is predominantly used in Japanese NLP, it provides a natural way of extracting syntactic constituents, which is useful for downstream applicatio...


Semantic Mapping Using Automatic Word Alignment and Semantic Role Labeling

To facilitate the application of semantics in statistical machine translation, we propose a broad-coverage predicate-argument structure mapping technique using automated resources. Our approach utilizes automatic syntactic and semantic parsers to generate Chinese-English predicate-argument structures. The system produced a many-to-many argument mapping for all PropBank argument types by computi...


Parsing with Generative Models of Predicate-Argument Structure

The model used by the CCG parser of Hockenmaier and Steedman (2002b) would fail to capture the correct bilexical dependencies in a language with freer word order, such as Dutch. This paper argues that probabilistic parsers should therefore model the dependencies in the predicate-argument structure, as in the model of Clark et al. (2002), and defines a generative model for CCG derivations that c...




Publication date: 1989